AITopics | belief mdp

Collaborating Authors

belief mdp

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Active Measuring in Reinforcement Learning With Delayed Negative Effects

Gao, Daiqi, Xu, Ziping, Rawashdeh, Aseel, Klasnja, Predrag, Murphy, Susan A.

arXiv.org Machine LearningOct-17-2025

Measuring states in reinforcement learning (RL) can be costly in real-world settings and may negatively influence future outcomes. We introduce the Actively Observable Markov Decision Process (AOMDP), where an agent not only selects control actions but also decides whether to measure the latent state. The measurement action reveals the true latent state but may have a negative delayed effect on the environment. We show that this reduced uncertainty may provably improve sample efficiency and increase the value of the optimal policy despite these costs. We formulate an AOMDP as a periodic partially observable MDP and propose an online RL algorithm based on belief states. To approximate the belief states, we further propose a sequential Monte Carlo method to jointly approximate the posterior of unknown static environment parameters and unobserved latent states. We evaluate the proposed algorithm in a digital health application, where the agent decides when to deliver digital interventions and when to assess users' health status through surveys.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

2510.14315

Genre: Research Report (1.00)

Industry: Health & Medicine > Consumer Health (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Control Synthesis in Partially Observable Environments for Complex Perception-Related Objectives

Xuan, Zetong, Wang, Yu

arXiv.org Artificial IntelligenceJul-8-2025

Perception-related tasks often arise in autonomous systems operating under partial observability. This work studies the problem of synthesizing optimal policies for complex perception-related objectives in environments modeled by partially observable Markov decision processes. To formally specify such objectives, we introduce \emph{co-safe linear inequality temporal logic} (sc-iLTL), which can define complex tasks that are formed by the logical concatenation of atomic propositions as linear inequalities on the belief space of the POMDPs. Our solution to the control synthesis problem is to transform the \mbox{sc-iLTL} objectives into reachability objectives by constructing the product of the belief MDP and a deterministic finite automaton built from the sc-iLTL objective. To overcome the scalability challenge due to the product, we introduce a Monte Carlo Tree Search (MCTS) method that converges in probability to the optimal policy. Finally, a drone-probing case study demonstrates the applicability of our method.

artificial intelligence, machine learning, objective, (17 more...)

arXiv.org Artificial Intelligence

2507.02942

Country:

North America > United States > Florida > Alachua County > Gainesville (0.14)
North America > United States > Iowa (0.04)
North America > United States > California (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Convex Is Back: Solving Belief MDPs With Convexity-Informed Deep Reinforcement Learning

Koutas, Daniel, Hettegger, Daniel, Papakonstantinou, Kostas G., Straub, Daniel

arXiv.org Artificial IntelligenceFeb-13-2025

We present a novel method for Deep Reinforcement Learning (DRL), incorporating the convex property of the value function over the belief space in Partially Observable Markov Decision Processes (POMDPs). We introduce hard- and soft-enforced convexity as two different approaches, and compare their performance against standard DRL on two well-known POMDP environments, namely the Tiger and FieldVisionRockSample problems. Our findings show that including the convexity feature can substantially increase performance of the agents, as well as increase robustness over the hyperparameter space, especially when testing on out-of-distribution domains. The source code for this work can be found at https://github.com/Dakout/Convex_DRL.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

2502.09298

Genre: Research Report > New Finding (0.53)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Value of Information and Reward Specification in Active Inference and POMDPs

Wei, Ran

arXiv.org Artificial IntelligenceAug-12-2024

Expected free energy (EFE) is a central quantity in active inference which has recently gained popularity due to its intuitive decomposition of the expected value of control into a pragmatic and an epistemic component. While numerous conjectures have been made to justify EFE as a decision making objective function, the most widely accepted is still its intuitiveness and resemblance to variational free energy in approximate Bayesian inference. In this work, we take a bottom up approach and ask: taking EFE as given, what's the resulting agent's optimality gap compared with a reward-driven reinforcement learning (RL) agent, which is well understood? By casting EFE under a particular class of belief MDP and using analysis tools from RL theory, we show that EFE approximates the Bayes optimal RL policy via information value. We discuss the implications for objective specification of active inference agents.

efe, inference, information and reward specification, (12 more...)

arXiv.org Artificial Intelligence

2408.06542

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
Europe > Belgium > Flanders > East Flanders > Ghent (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.46)

Add feedback

A Federated Online Restless Bandit Framework for Cooperative Resource Allocation

Tong, Jingwen, Li, Xinran, Fu, Liqun, Zhang, Jun, Letaief, Khaled B.

arXiv.org Artificial IntelligenceJun-12-2024

Restless multi-armed bandits (RMABs) have been widely utilized to address resource allocation problems with Markov reward processes (MRPs). Existing works often assume that the dynamics of MRPs are known prior, which makes the RMAB problem solvable from an optimization perspective. Nevertheless, an efficient learning-based solution for RMABs with unknown system dynamics remains an open problem. In this paper, we study the cooperative resource allocation problem with unknown system dynamics of MRPs. This problem can be modeled as a multi-agent online RMAB problem, where multiple agents collaboratively learn the system dynamics while maximizing their accumulated rewards. We devise a federated online RMAB framework to mitigate the communication overhead and data privacy issue by adopting the federated learning paradigm. Based on this framework, we put forth a Federated Thompson Sampling-enabled Whittle Index (FedTSWI) algorithm to solve this multi-agent online RMAB problem. The FedTSWI algorithm enjoys a high communication and computation efficiency, and a privacy guarantee. Moreover, we derive a regret upper bound for the FedTSWI algorithm. Finally, we demonstrate the effectiveness of the proposed algorithm on the case of online multi-user multi-channel access. Numerical results show that the proposed algorithm achieves a fast convergence rate of $\mathcal{O}(\sqrt{T\log(T)})$ and better performance compared with baselines. More importantly, its sample complexity decreases with the number of agents.

algorithm, fedtswi algorithm, system dynamic, (15 more...)

arXiv.org Artificial Intelligence

2406.07992

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
Asia > China > Fujian Province > Xiamen (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(2 more...)

Genre: Research Report (0.83)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.94)

Add feedback

Learning Explainable and Better Performing Representations of POMDP Strategies

Bork, Alexander, Chakraborty, Debraj, Grover, Kush, Kretinsky, Jan, Mohr, Stefanie

arXiv.org Artificial IntelligenceJan-20-2024

Strategies for partially observable Markov decision processes (POMDP) typically require memory. One way to represent this memory is via automata. We present a method to learn an automaton representation of a strategy using a modification of the L*-algorithm. Compared to the tabular representation of a strategy, the resulting automaton is dramatically smaller and thus also more explainable. Moreover, in the learning process, our heuristics may even improve the strategy's performance. In contrast to approaches that synthesize an automaton directly from the POMDP thereby solving it, our approach is incomparably more scalable.

fsc, observation sequence, representation, (14 more...)

arXiv.org Artificial Intelligence

2401.07656

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Netherlands > North Brabant > Eindhoven (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
(12 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Intermittently Observable Markov Decision Processes

Chen, Gongpu, Liew, Soung-Chang

arXiv.org Artificial IntelligenceFeb-22-2023

This paper investigates MDPs with intermittent state information. We consider a scenario where the controller perceives the state information of the process via an unreliable communication channel. The transmissions of state information over the whole time horizon are modeled as a Bernoulli lossy process. Hence, the problem is finding an optimal policy for selecting actions in the presence of state information losses. We first formulate the problem as a belief MDP to establish structural results. The effect of state information losses on the expected total discounted reward is studied systematically. Then, we reformulate the problem as a tree MDP whose state space is organized in a tree structure. Two finite-state approximations to the tree MDP are developed to find near-optimal policies efficiently. Finally, we put forth a nested value iteration algorithm for the finite-state approximations, which is proved to be faster than standard value iteration. Numerical results demonstrate the effectiveness of our methods.

artificial intelligence, machine learning, position state, (18 more...)

arXiv.org Artificial Intelligence

2302.11761

Country:

Asia > China > Hong Kong (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Convergence of Finite Memory Q-Learning for POMDPs and Near Optimality of Learned Policies under Filter Stability

Kara, Ali Devran, Yuksel, Serdar

arXiv.org Artificial IntelligenceOct-25-2022

In this paper, for POMDPs, we provide the convergence of a Q learning algorithm for control policies using a finite history of past observations and control actions, and, consequentially, we establish near optimality of such limit Q functions under explicit filter stability conditions. We present explicit error bounds relating the approximation error to the length of the finite history window. We establish the convergence of such Q-learning iterations under mild ergodicity assumptions on the state process during the exploration phase. We further show that the limit fixed point equation gives an optimal solution for an approximate belief-MDP. We then provide bounds on the performance of the policy obtained using the limit Q values compared to the performance of the optimal policy for the POMDP, where we also present explicit conditions using recent results on filter stability in controlled POMDPs. While there exist many experimental results, (i) the rigorous asymptotic convergence (to an approximate MDP value function) for such finite-memory Q-learning algorithms, and (ii) the near optimality with an explicit rate of convergence (in the memory size) are results that are new to the literature, to our knowledge.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2103.12158

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Under-Approximating Expected Total Rewards in POMDPs

Bork, Alexander, Katoen, Joost-Pieter, Quatmann, Tim

arXiv.org Artificial IntelligenceJan-21-2022

We consider the problem: is the optimal expected total reward to reach a goal state in a partially observable Markov decision process (POMDP) below a given threshold? We tackle this -- generally undecidable -- problem by computing under-approximations on these total expected rewards. This is done by abstracting finite unfoldings of the infinite belief MDP of the POMDP. The key issue is to find a suitable under-approximation of the value function. We provide two techniques: a simple (cut-off) technique that uses a good policy on the POMDP, and a more advanced technique (belief clipping) that uses minimal shifts of probabilities between beliefs. We use mixed-integer linear programming (MILP) to find such minimal probability shifts and experimentally show that our techniques scale quite well while providing tight lower bounds on the expected total reward.

belief mdp, pomdp, supp, (13 more...)

arXiv.org Artificial Intelligence

2201.08772

Country:

North America > Canada > British Columbia (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Monte Carlo Information-Oriented Planning

Thomas, Vincent, Hutin, Gérémy, Buffet, Olivier

arXiv.org Artificial IntelligenceMar-21-2021

In this article, we discuss how to solve information-gathering problems expressed as rho-POMDPs, an extension of Partially Observable Markov Decision Processes (POMDPs) whose reward rho depends on the belief state. Point-based approaches used for solving POMDPs have been extended to solving rho-POMDPs as belief MDPs when its reward rho is convex in B or when it is Lipschitz-continuous. In the present paper, we build on the POMCP algorithm to propose a Monte Carlo Tree Search for rho-POMDPs, aiming for an efficient on-line planner which can be used for any rho function. Adaptations are required due to the belief-dependent rewards to (i) propagate more than one state at a time, and (ii) prevent biases in value estimates. An asymptotic convergence proof to epsilon-optimal values is given when rho is continuous. Experiments are conducted to analyze the algorithms at hand and show that they outperform myopic approaches.

algorithm, node, pomdp, (17 more...)

arXiv.org Artificial Intelligence

2103.11345

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)

Genre:

Research Report (0.64)
Workflow (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback